An Eprints Apache Log Filter for Non-Redundant Document Downloads by Browser Agents
Abstract
3 Processing Algorithm
3.1 Document File Format
3.2 Successful Downloads
3.3 Removing Redundancy
3.4 Obtaining Document Metadata
3.5 Preparing Output for Web Analysis Software
3.6 Script I/O
3.6.1 Script
3.7 Shell Script for Batch Processing
Similar resources
An Approach to Log Management: Prototyping a Design of Agent for Log Harvesting
This document describes the state of development of agents. Agents capture logs from devices, then normalize, reduce, and catalog them using metadata. Once these steps are complete, the agents transmit the cataloged data to a warehouse server using Transportation Protocol. An agent also uses orchestration parameters to transmit modified logs to a data warehouse server. These parameters can be re...
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique for eliminating irrelevant and redundant features, which enhances the performance of classifiers. When a dataset contains many irrelevant and redundant features, accuracy fails to improve and classifier performance degrades. To avoid this, this paper presents a new hybrid feature selection method usi...
A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High dimensional microarray datasets are difficult to classify since they have many features with a small number of instances and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered the current state-of-the-art models for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...
The Annotation-enriched non-redundant patent sequence databases
The EMBL-European Bioinformatics Institute (EMBL-EBI) offers public access to patent sequence data, providing a valuable service to the intellectual property and scientific communities. The non-redundant (NR) patent sequence databases comprise two-level nucleotide and protein sequence clusters (NRNL1, NRNL2, NRPL1 and NRPL2) based on sequence identity (level-1) and patent family (level-2). Anno...